Sequential recommendation is an important task to predict the next-item to access based on a sequence of interacted items. Most existing works learn user preference as the transition pattern from the previous item to the next one, ignoring the time interval between these two items. However, we observe that the time interval in a sequence may vary significantly different, and thus result in the ineffectiveness of user modeling due to the issue of \emph{preference drift}. In fact, we conducted an empirical study to validate this observation, and found that a sequence with uniformly distributed time interval (denoted as uniform sequence) is more beneficial for performance improvement than that with greatly varying time interval. Therefore, we propose to augment sequence data from the perspective of time interval, which is not studied in the literature. Specifically, we design five operators (Ti-Crop, Ti-Reorder, Ti-Mask, Ti-Substitute, Ti-Insert) to transform the original non-uniform sequence to uniform sequence with the consideration of variance of time intervals. Then, we devise a control strategy to execute data augmentation on item sequences in different lengths. Finally, we implement these improvements on a state-of-the-art model CoSeRec and validate our approach on four real datasets. The experimental results show that our approach reaches significantly better performance than the other 11 competing methods. Our implementation is available: https://github.com/KingGugu/TiCoSeRec.
translated by 谷歌翻译
This paper is a technical overview of DeepMind and Google's recent work on reinforcement learning for controlling commercial cooling systems. Building on expertise that began with cooling Google's data centers more efficiently, we recently conducted live experiments on two real-world facilities in partnership with Trane Technologies, a building management system provider. These live experiments had a variety of challenges in areas such as evaluation, learning from offline data, and constraint satisfaction. Our paper describes these challenges in the hope that awareness of them will benefit future applied RL work. We also describe the way we adapted our RL system to deal with these challenges, resulting in energy savings of approximately 9% and 13% respectively at the two live experiment sites.
translated by 谷歌翻译
从单眼RGB图像中重建3D手网络,由于其在AR/VR领域的巨大潜在应用,引起了人们的注意力越来越多。大多数最先进的方法试图以匿名方式解决此任务。具体而言,即使在连续录制会话中用户没有变化的实际应用程序中实际上可用,因此忽略了该主题的身份。在本文中,我们提出了一个身份感知的手网格估计模型,该模型可以结合由受试者的内在形状参数表示的身份信息。我们通过将提出的身份感知模型与匿名对待主题的基线进行比较来证明身份信息的重要性。此外,为了处理未见测试对象的用例,我们提出了一条新型的个性化管道来校准固有的形状参数,仅使用该受试者的少数未标记的RGB图像。在两个大型公共数据集上进行的实验验证了我们提出的方法的最先进性能。
translated by 谷歌翻译
最近,视觉变压器及其变体在人类和多视图人类姿势估计中均起着越来越重要的作用。将图像补丁视为令牌,变形金刚可以对整个图像中的全局依赖项进行建模或其他视图中的图像。但是,全球关注在计算上是昂贵的。结果,很难将这些基于变压器的方法扩展到高分辨率特征和许多视图。在本文中,我们提出了代币螺旋的姿势变压器(PPT)进行2D人姿势估计,该姿势估计可以找到粗糙的人掩模,并且只能在选定的令牌内进行自我注意。此外,我们将PPT扩展到多视图人类姿势估计。我们建立在PPT的基础上,提出了一种新的跨视图融合策略,称为人类区域融合,该策略将所有人类前景像素视为相应的候选者。可可和MPII的实验结果表明,我们的PPT可以在减少计算的同时匹配以前的姿势变压器方法的准确性。此外,对人类360万和滑雪姿势的实验表明,我们的多视图PPT可以有效地从多个视图中融合线索并获得新的最新结果。
translated by 谷歌翻译
估计每个视图中的2D人类姿势通常是校准多视图3D姿势估计的第一步。但是,2D姿势探测器的性能遭受挑战性的情况,例如闭塞和斜视角。为了解决这些挑战,以前的作品从eMipolar几何中的不同视图之间导出点对点对应关系,并利用对应关系来合并预测热插拔或特征表示。除了后预测合并/校准之外,我们引入了用于多视图3D姿势估计的变压器框架,其目的地通过将来自不同视图的信息集成信息来直接改善单个2D预测器。灵感来自先前的多模态变压器,我们设计一个统一的变压器体系结构,命名为输送,从当前视图和邻近视图中保险。此外,我们提出了eMipolar字段的概念来将3D位置信息编码到变压器模型中。由Epipolar字段引导的3D位置编码提供了一种有效的方式来编码不同视图的像素之间的对应关系。人类3.6M和滑雪姿势的实验表明,与其他融合方法相比,我们的方法更有效,并且具有一致的改进。具体而言,我们在256 x 256分辨率上只有5米参数达到人类3.6米的25.8毫米MPJPE。
translated by 谷歌翻译
The dual-encoder has become the de facto architecture for dense retrieval. Typically, it computes the latent representations of the query and document independently, thus failing to fully capture the interactions between the query and document. To alleviate this, recent work expects to get query-informed representations of documents. During training, it expands the document with a real query, while replacing the real query with a generated pseudo query at inference. This discrepancy between training and inference makes the dense retrieval model pay more attention to the query information but ignore the document when computing the document representation. As a result, it even performs worse than the vanilla dense retrieval model, since its performance depends heavily on the relevance between the generated queries and the real query. In this paper, we propose a curriculum sampling strategy, which also resorts to the pseudo query at training and gradually increases the relevance of the generated query to the real query. In this way, the retrieval model can learn to extend its attention from the document only to both the document and query, hence getting high-quality query-informed document representations. Experimental results on several passage retrieval datasets show that our approach outperforms the previous dense retrieval methods1.
translated by 谷歌翻译
Human reading comprehension often requires reasoning of event semantic relations in narratives, represented by Event-centric Question-Answering (QA). To address event-centric QA, we propose a novel QA model with contrastive learning and invertible event transformation, call TranCLR. Our proposed model utilizes an invertible transformation matrix to project semantic vectors of events into a common event embedding space, trained with contrastive learning, and thus naturally inject event semantic knowledge into mainstream QA pipelines. The transformation matrix is fine-tuned with the annotated event relation types between events that occurred in questions and those in answers, using event-aware question vectors. Experimental results on the Event Semantic Relation Reasoning (ESTER) dataset show significant improvements in both generative and extractive settings compared to the existing strong baselines, achieving over 8.4% gain in the token-level F1 score and 3.0% gain in Exact Match (EM) score under the multi-answer setting. Qualitative analysis reveals the high quality of the generated answers by TranCLR, demonstrating the feasibility of injecting event knowledge into QA model learning. Our code and models can be found at https://github.com/LuJunru/TranCLR.
translated by 谷歌翻译
半监督学习在医疗领域取得了重大进展,因为它减轻了收集丰富的像素的沉重负担,用于针对语义分割任务。现有的半监督方法增强了利用从有限标记数据获得的现有知识从未标记数据提取功能的能力。然而,由于标记数据的稀缺性,模型提取的特征在监督学习中受到限制,并且对未标记数据的预测质量也无法保证。两者都将妨碍一致培训。为此,我们提出了一种新颖的不确定性感知计划,以使模型自动学习地区。具体而言,我们采用Monte Carlo采样作为获得不确定性地图的估计方法,该方法可以作为损失损失的重量,以强制根据监督学习和无监督学习的特征将模型专注于有价值的区域。同时,在后退过程中,我们通过增强不同任务之间的梯度流动,联合无监督和监督损失来加速网络的融合。定量地,我们对三个挑战的医疗数据集进行了广泛的实验。实验结果表明,最先进的对应物的理想改善。
translated by 谷歌翻译
在对任何预测或政策分析的模拟多变量时间序列中,已经有利于数据中的损失关系。然而,回归分析通常用于相关关系,很少少数研究专注于因果区发现的方差分析。我们首先使用虚构的传染媒介自回归模型建立抵抗效果关系的均衡。在均衡中,从噪声识别长期关系,并且杂散的关系忽略于零。溶液称为因果分布,测量导致所有系列或特定受影响的相对强度。如果一组外源性数据影响其他数据但不反之亦然,那么,在理论上,其他变量的因果分布必须为零。零因果关系的假设试验是决定变量的规则是内源性的。我们的新方法在识别模拟研究中的数据之间的真实损失关系方面具有高精度。我们还应用估算因果因素对气候变化贡献的方法。
translated by 谷歌翻译
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes the image and point clouds tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple while its performance is impressive. CMT obtains 73.0% NDS on nuScenes benchmark. Moreover, CMT has a strong robustness even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
translated by 谷歌翻译